Abstract This article proposes and studies a link between statistics and the theory of Dirichlet forms used to compute errors. The error calculus based on Dirichlet forms is an extension of Gauss's classical approach to error propagation. The aim of this paper is to derive error structures from measurements. The links with Fisher's information lay the foundations of a strong connection with experiment. Here we show that this connection behaves well under changes of variables and is related to the theory of asymptotic statistics. Finally, the study of products lays the groundwork for an infinite dimensional empirical error calculus.

Mathematical subject classification (2000): 31C25, 47B25, 49Q12, 62F99, 62B10, 
65G99. 

Keywords: Error, sensitivity, Dirichlet forms, squared field operator, Cramer-Rao inequality.

1 Introduction

1.1 Intuitive notion of error structures 

Let us consider a random quantity $C$ (for example the concentration of some pollutant in a river) that can be measured by an experimental device whose result exhibits an error denoted by $\Delta C$. These quantities may be represented as random variables, generally correlated (for higher pollution levels, the device becomes fuzzier). In this classical probabilistic approach we have to know the law of the pair $(C, \Delta C)$ or equivalently the law of $C$ and the conditional law of $\Delta C$ given $C$. Thus, the study of error transmission is associated with the calculus of images of probability measures. Unfortunately, determining the law of $\Delta C$ given $C$ by experiment is practically impossible. Now, let us look at the propagation of errors when the errors are small. For the sake of simplicity we temporarily adopt the following assumptions:

• Only the conditional variance $\mathrm{var}[\Delta C \mid C]$ is known.

• The errors are small enough to allow the simplification usually performed by physicists: $\Delta C = \varepsilon Y$ where $Y$ is a bounded random variable and $\varepsilon$ a size parameter.

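If $f$ is in $C^3(\mathbb{R}, \mathbb{R})$ with bounded derivatives, supposing at first that the error is conditionally centered, $E[\Delta C \mid C] = 0$, Taylor's formula gives

$$\Delta(f(C)) = f'(C)\,\Delta C + \frac{1}{2} f''(C)\,(\Delta C)^2 + \varepsilon^3 O(1)$$

hence

$$\mathrm{var}[\Delta f(C) \mid C] = f'^2(C)\,\mathrm{var}[\Delta C \mid C] + \varepsilon^3 O(1)$$
$$E[\Delta f(C) \mid C] = \frac{1}{2} f''(C)\,\mathrm{var}[\Delta C \mid C] + \varepsilon^3 O(1).$$

In the same way, for another regular function $h$ we have:

$$\mathrm{var}[\Delta(h \circ f(C)) \mid C] = h'^2(f(C))\,\mathrm{var}[\Delta f(C) \mid C] + \varepsilon^3 O(1) \qquad (1)$$

$$E[\Delta(h \circ f(C)) \mid C] = h'(f(C))\,E[\Delta f(C) \mid C] + \frac{1}{2} h''(f(C))\,\mathrm{var}[\Delta f(C) \mid C] + \varepsilon^3 O(1). \qquad (2)$$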

These formulae for the propagation of variances and biases show that once a nonlinear function has been applied, the error is no longer centered and the bias has the same order of magnitude as the variance. This phenomenon persists under further applications. Moreover we can see that the calculus on the variances is a first order calculus and does not involve the biases, whereas the calculus on the biases is of second order and involves the variances. This remark is fundamental: the error calculus on variances is necessarily the first step of an analysis of errors based on differential methods. It will be the main focus of our study.
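The orders of magnitude of the variance and of the bias can be checked on a toy case. The following sketch is only an illustration under stated assumptions: the error is modelled as $\Delta C = \varepsilon Y$ with $Y = \pm 1$ with probability 1/2 (a hypothetical choice of bounded centered variable), and $f = \exp$, so that $f' = f'' = \exp$. The helper name `error_moments` is ours.

```python
import math


def error_moments(f, c, eps):
    """Exact conditional mean and variance of Delta f = f(c + eps*Y) - f(c)
    when Y is +1 or -1 with probability 1/2 (bounded, centered, variance 1)."""
    up = f(c + eps) - f(c)
    down = f(c - eps) - f(c)
    mean = 0.5 * (up + down)
    var = 0.5 * (up - mean) ** 2 + 0.5 * (down - mean) ** 2
    return mean, var


c, eps = 0.3, 1e-2
bias, var = error_moments(math.exp, c, eps)

# First order rule on variances: f'(c)^2 * var[Delta C | C], with var[Delta C | C] = eps^2.
predicted_var = math.exp(c) ** 2 * eps ** 2
# Second order rule on biases: (1/2) * f''(c) * var[Delta C | C].
predicted_bias = 0.5 * math.exp(c) * eps ** 2

print(var / predicted_var, bias / predicted_bias)
```

Both ratios are close to 1 for small `eps`, and the bias is of the same order ($\varepsilon^2$) as the variance.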

On the probability space associated to the observation of $C$, $(\mathbb{R}, \mathcal{B}or(\mathbb{R}), \text{law of } C)$ (where $\mathcal{B}or(\mathbb{R})$ is the Borel $\sigma$-field of $\mathbb{R}$), we introduce the operator $\Gamma_C$, called the quadratic error operator, which provides for each function $f$ the asymptotic conditional variance of the error on $f(C)$:

$$\Gamma_C[f](x) = \lim_{\varepsilon \to 0} \frac{\mathrm{var}[\Delta f(C) \mid C = x]}{\varepsilon^2}.$$

Like the covariance operator in probability theory, $\Gamma_C$ polarizes into a bilinear operator:

$$\Gamma_C[f, g](x) = \lim_{\varepsilon \to 0} \frac{\mathrm{covar}[\Delta f(C), \Delta g(C) \mid C = x]}{\varepsilon^2}.$$

Moreover if $F$ is in $C^2(\mathbb{R}^2, \mathbb{R})$ with bounded derivatives, we obtain a transport formula known as Gauss' law of error propagation ([B], Chap. 1, Appendix):

$$\Gamma_C[F(f, g)] = F_1'^2(f, g)\,\Gamma_C[f] + F_2'^2(f, g)\,\Gamma_C[g] + 2 F_1'(f, g) F_2'(f, g)\,\Gamma_C[f, g]. \qquad (3)$$
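Gauss' law (3) is a pointwise algebraic rule, so it can be written as a small function. This is a minimal sketch: the function name `gauss_propagation` and the numerical values are illustrative assumptions, not taken from the paper.

```python
def gauss_propagation(dF1, dF2, gamma_f, gamma_g, gamma_fg):
    """Gauss' law (3): quadratic error on F(f, g) at one point.

    dF1, dF2  : partial derivatives of F evaluated at (f, g)
    gamma_f   : Gamma[f], variance of the error on f
    gamma_g   : Gamma[g], variance of the error on g
    gamma_fg  : Gamma[f, g], covariance of the two errors
    """
    return dF1 ** 2 * gamma_f + dF2 ** 2 * gamma_g + 2.0 * dF1 * dF2 * gamma_fg


# Hypothetical example: F(a, b) = a * b evaluated at f = 2, g = 3,
# so that dF1 = g = 3 and dF2 = f = 2.
uncorrelated = gauss_propagation(3.0, 2.0, 0.04, 0.01, 0.0)
correlated = gauss_propagation(3.0, 2.0, 0.04, 0.01, 0.01)
print(uncorrelated, correlated)
```

A positive covariance between the two errors increases the error on the product here because both partial derivatives are positive.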

Now we can adopt an intuitive definition of an error structure:
An error structure is a probability space $(W, \mathcal{W}, m)$ equipped with a positive, symmetric, bilinear operator $\Gamma$ acting on random variables and fulfilling a first order functional calculus on regular functions:

$$\Gamma[F(f_1, \dots, f_n)] = \sum_{i,j} F_i'(f_1, \dots, f_n)\, F_j'(f_1, \dots, f_n)\, \Gamma[f_i, f_j].$$

If $\varphi : \mathbb{R} \to \mathbb{R}$ is a regular mapping, this definition is preserved by image: we can equip the image space $(\mathbb{R}, \mathcal{B}or(\mathbb{R}), \text{law of } \varphi(C))$ with the quadratic error operator associated to the observation of $\varphi(C)$. We have the following fundamental relation

$$\Gamma_{\varphi(C)}[f](x) = E\big[\Gamma_C[f \circ \varphi](C) \mid \varphi(C) = x\big]. \qquad (4)$$

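When we observe a two-dimensional quantity $C = (C_1, C_2)$ with erroneous components modelled with two error structures $(\mathbb{R}, \mathcal{B}or(\mathbb{R}), \text{law of } C_1, \Gamma_{C_1})$ and $(\mathbb{R}, \mathcal{B}or(\mathbb{R}), \text{law of } C_2, \Gamma_{C_2})$, if $(C_1, \Delta C_1)$ is independent of $(C_2, \Delta C_2)$ we need to define an error structure $(\mathbb{R}^2, \mathcal{B}(\mathbb{R}^2), \text{law of } C_1 \otimes \text{law of } C_2, \Gamma_{C_1 \otimes C_2})$ such that $\Gamma_{C_1 \otimes C_2}$ expresses a summation of errors component by component. Indeed, if $F : \mathbb{R}^2 \to \mathbb{R}$ is regular, from the independence hypothesis it follows

$$\mathrm{var}[\Delta(F(C_1, C_2)) \mid (C_1, C_2)] = F_1'^2(C_1, C_2)\,\mathrm{var}[\Delta C_1 \mid C_1] + F_2'^2(C_1, C_2)\,\mathrm{var}[\Delta C_2 \mid C_2] + \varepsilon^3 O(1)$$

thus

$$\Gamma_{C_1 \otimes C_2}[F](x, y) = \Gamma_{C_1}[F(\cdot, y)](x) + \Gamma_{C_2}[F(x, \cdot)](y). \qquad (5)$$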
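The product rule (5) can be sketched numerically. The code below assumes, as an illustration, that each one-dimensional operator acts through the functional calculus as $\Gamma_{C_i}[f](x) = \gamma_i(x) f'(x)^2$ with $\gamma_i = \Gamma_{C_i}[\mathrm{Id}]$, and approximates derivatives by central differences. The names `gamma_1d` and `gamma_product` and the values of $\gamma_i$ are hypothetical.

```python
def gamma_1d(gamma_id, f, x, h=1e-6):
    """One-dimensional quadratic error operator: gamma_id(x) * f'(x)^2,
    with f' approximated by a central finite difference."""
    derivative = (f(x + h) - f(x - h)) / (2.0 * h)
    return gamma_id(x) * derivative ** 2


def gamma_product(gamma_id_1, gamma_id_2, F, x, y):
    """Formula (5): errors on independent components add up,
    each partial map being handled by its own one-dimensional operator."""
    first = gamma_1d(gamma_id_1, lambda s: F(s, y), x)
    second = gamma_1d(gamma_id_2, lambda t: F(x, t), y)
    return first + second


# Hypothetical constant error variances on each component.
gamma_c1 = lambda x: 0.04
gamma_c2 = lambda y: 0.01

value = gamma_product(gamma_c1, gamma_c2, lambda a, b: a * b, 2.0, 3.0)
print(value)  # close to 3^2 * 0.04 + 2^2 * 0.01
```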

The preceding intuitive considerations lead to the following rigorous mathematical framework.

1.2 An extension tool 

We now present an axiomatic extension of the preceding notion of error structures using the language of Dirichlet forms. It gives a powerful and easy-to-handle tool for error calculations and sensitivity analysis. As noted above, we limit ourselves to a first order calculus, which is already significant in most applications. We refer to [0] for a calculus on biases involving the infinitesimal generator associated to the underlying Dirichlet form. This error calculus based on Dirichlet forms lies between the probabilistic approach (errors are supposed to be random variables) and the deterministic one (dealing with infinitely small deterministic errors in order to use differential calculus).

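From now on, an error structure is a term $(W, \mathcal{W}, m, \mathbb{D}, \Gamma)$ where $(W, \mathcal{W}, m)$ is a probability space, $\mathbb{D}$ is a dense vector subspace of $L^2(m)$ and $\Gamma$ is a positive symmetric bilinear map from $\mathbb{D} \times \mathbb{D}$ into $L^1(m)$ fulfilling:

1) the functional calculus of class $C^1 \cap Lip$, i.e. if $U = (U_1, \dots, U_n) \in \mathbb{D}^n$, $V = (V_1, \dots, V_p) \in \mathbb{D}^p$, $F \in C^1(\mathbb{R}^n, \mathbb{R}) \cap Lip = \{C^1 \text{ and Lipschitz}\}$ and $G \in C^1(\mathbb{R}^p, \mathbb{R}) \cap Lip$ then

$$(F(U_1, \dots, U_n), G(V_1, \dots, V_p)) \in \mathbb{D}^2$$
and

$$\Gamma[F(U_1, \dots, U_n), G(V_1, \dots, V_p)] = \sum_{i,j} F_i'(U_1, \dots, U_n)\, G_j'(V_1, \dots, V_p)\, \Gamma[U_i, V_j],$$

2) $1 \in \mathbb{D}$ (this implies $\Gamma[1] = 0$),

3) the bilinear form $\mathcal{E}[F, G] = \frac{1}{2} \int \Gamma[F, G]\, dm$ defined on $\mathbb{D} \times \mathbb{D}$ is closed, i.e. $\mathbb{D}$ is complete under the norm of the graph

$$\| \cdot \|_{\mathbb{D}} = \left( \| \cdot \|_{L^2(m)}^2 + \mathcal{E}[\,\cdot\,] \right)^{1/2}.$$

We always write $\Gamma[F]$ for $\Gamma[F, F]$ and $\mathcal{E}[F]$ for $\mathcal{E}[F, F]$.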



This notion is derived from the theory of Dirichlet forms ([3] Ch. 1, [9], [13]). It is a natural extension of the classical Gauss approach ([!]) and it seems to be a good way to study the propagation of errors and the sensitivity to changes of parameters in physical and financial models (@],[5],[E]).

Condition 1) is similar to Gauss' law of small error propagation (3). For $U = (U_1, \dots, U_n) \in \mathbb{D}^n$, the intuitive meaning of the matrix $\Gamma[U] = [\Gamma[U_i, U_j]]_{1 \le i,j \le n}$ is the variance-covariance matrix of the error on $U$ ([6] Ch. 1). Implicitly, we still suppose that the error is infinitely small although this is not mentioned in the notation. It is as if we had an infinitely small unit to measure errors that was fixed in the whole problem. Hypothesis 3) is then added to the heuristic definition and can be seen as a coherence principle. In fact, if the random variables $(X_n)_{n \in \mathbb{N}}$ and $X$ are in $\mathbb{D}$, if $X_n \to X$ in $L^2(m)$ and $(X_n, \text{error on } X_n)$ converges in a suitable sense, it necessarily converges to the pair $(X, \text{error on } X)$.

From the hypotheses mentioned above, $\mathcal{E}$ is a local Dirichlet form and $\Gamma$ its associated squared field operator. The domain $\mathbb{D}$ is preserved by Lipschitz functions: if $F : \mathbb{R}^n \to \mathbb{R}$ is a contraction in the following sense

$$|F(x) - F(y)| \le \sum_{i=1}^n |x_i - y_i|$$

then for $U = (U_1, \dots, U_n) \in \mathbb{D}^n$ one has $F(U) \in \mathbb{D}$ and

$$\Gamma[F(U_1, \dots, U_n)]^{1/2} \le \sum_{i=1}^n \Gamma[U_i]^{1/2}.$$

We would like to emphasize that the closedness property is the keystone of our approach. It plays the same role as $\sigma$-additivity in probability theory and makes it possible to compute the errors on functions known as limits of simpler objects.

The operations of taking images by mappings (definition 3.1.2) and making countable products (definition 5.0.8) naturally provide error structures on spaces of stochastic processes ([3] Ch. 6).
Since a probability space $(W, \mathcal{W}, m)$ can be known thanks to statistical experiments, we raise the problem of the empirical identification of an error structure. In the same way as the $\sigma$-additivity of $m$ on $\mathcal{W}$ could not result from experiments but is a fundamental mathematical hypothesis, our error structure will have to satisfy the closedness property 3), which cannot be deduced from observation. Thus let $\theta$ be a parameter taking its values in an open set $\Theta \subset \mathbb{R}^d$. It is frequently useful to treat $\theta$ as the realization of a random variable $V : (\Omega, \mathcal{A}, \mathbb{P}) \to \Theta$ with a known distribution $\rho$ chosen by combining experience with convenience ([13] p. 225). Let $X$ be a random variable defined on the probability space $(\Omega, \mathcal{A}, \mathbb{P})$ with values in a measurable space $(E, \mathcal{F})$. Let us denote by $P_\theta$ the conditional law of $X$ given $V = \theta$. Classically, to estimate $\theta$ we may use the statistical model $(P_\theta)_{\theta \in \Theta}$ generated by the observations of $X$. Here we want to equip $\Theta$ with an error structure

$$S_V = (\Theta, \mathcal{B}(\Theta), \rho, \mathbb{D}_V, \Gamma_V) \qquad (6)$$

where $\Gamma_V$ will express the precision of our knowledge on $\theta$. Our approach is to consider $\Gamma_V$ as the inverse of the Fisher information matrix, which is an accuracy measure for regular statistical models (see [8]). We will study the behavior of this identification under changes of variables and products to show its remarkable stability.

2 The Cramer-Rao Inequality (C.R.I.) and the Fundamental Identification (F.I.)

2.1 Regular models.

From now on $\langle \cdot, \cdot \rangle$ will denote the usual scalar product on $\mathbb{R}^d$ and $\| \cdot \|$ its associated norm. We suppose that $(P_\theta)_{\theta \in \Theta}$ satisfies the conditions of regular models ([11] p. 65):

(a) The measures $P_\theta$ are absolutely continuous with respect to a $\sigma$-finite measure $\mu$ and $\frac{dP_\theta}{d\mu} = f(\cdot, \theta) > 0$.

(b) $\theta \to f(x, \theta)$ is continuous for $\mu$-almost all $x$.

(c) We set $g(x, \theta) = \sqrt{f(x, \theta)}$. There exists $\phi : E \times \Theta \to \mathbb{R}^d$ such that $\forall \theta \in \Theta$,

$$\int \| \phi(x, \theta) \|^2 \, d\mu(x) < \infty$$

and

$$\int |g(x, \theta + h) - g(x, \theta) - \langle \phi(x, \theta), h \rangle|^2 \, d\mu(x) = o(\| h \|^2).$$

Thus the positive semi-definite matrix $J(\theta) = 4 \int \phi(x, \theta)\, \phi(x, \theta)^t \, d\mu(x)$ is defined as the Fisher information matrix of our model.

(d) $\theta \to \phi(\cdot, \theta)$ is continuous in $L^2(\mu)$.
(e) The model is identifiable: $\theta \to P_\theta$ is injective.
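As an illustration of condition (c), the Fisher information $J(\theta) = 4 \int \phi^2 \, d\mu$ can be approximated numerically in dimension $d = 1$. The sketch below assumes a Gaussian location model with known standard deviation (a hypothetical example, for which the classical value is $1/\sigma^2$), takes $\mu$ to be the Lebesgue measure truncated to a large interval, and approximates $\phi = \partial_\theta \sqrt{f}$ by a central difference. The function names are ours.

```python
import math


def fisher_information(f, theta, lo, hi, n=20000, h=1e-5):
    """Approximate J(theta) = 4 * integral of (d/dtheta sqrt(f(x, theta)))^2 dx
    on [lo, hi] with the midpoint rule (d = 1, mu = Lebesgue measure)."""
    dx = (hi - lo) / n
    total = 0.0
    for k in range(n):
        x = lo + (k + 0.5) * dx
        # phi(x, theta): derivative in quadratic mean of g = sqrt(f), by central difference.
        phi = (math.sqrt(f(x, theta + h)) - math.sqrt(f(x, theta - h))) / (2.0 * h)
        total += phi * phi * dx
    return 4.0 * total


def gaussian_density(x, theta, sigma=2.0):
    """Density of N(theta, sigma^2); sigma is an assumed known constant."""
    z = (x - theta) / sigma
    return math.exp(-0.5 * z * z) / (sigma * math.sqrt(2.0 * math.pi))


J = fisher_information(gaussian_density, 0.0, -30.0, 30.0)
print(J)  # close to 1 / sigma^2 = 0.25
```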



Remarks A: i) There exist several definitions of regular models. Here we use a notion taken from [11] where the conditions are quite general. These hypotheses are made to allow a differentiation under the integral sign, which is needed for the proof of the Cramer-Rao inequality. One can find in [1] another definition using classical differential calculus and supposing that $J$ is continuous, whereas here this is a simple consequence of (d).

ii) Assumption (c) is a condition of differentiability in quadratic mean in $L^2(\mu)$. Moreover, if we assume that $\theta \to f(\cdot, \theta)$ is differentiable in the classical sense then

$$\phi(x, \theta) = \frac{\nabla_\theta f(x, \theta)}{2 \sqrt{f(x, \theta)}}$$

and we obtain the following expression of the so-called Fisher information matrix

$$J(\theta) = \left[ \int \frac{\frac{\partial f(x, \theta)}{\partial \theta_i} \frac{\partial f(x, \theta)}{\partial \theta_j}}{f(x, \theta)} \, d\mu(x) \right]_{1 \le i,j \le d}.$$

To establish differentiability in quadratic mean, one often proceeds by showing classical differentiability and equi-integrability (see [7], [10]).

iii) Identifiability is a purely statistical hypothesis. Intuitively, it means that the model can distinguish two different values of the parameter: $\theta' \ne \theta''$ if and only if $P_{\theta'} \ne P_{\theta''}$. In this case, if independent experiments are available, we have an infinite family of independent variables with the same law $P_\theta$ denoted by $Z^\theta = (X_i^\theta)_{i \in \mathbb{N}}$ and for $\theta' \ne \theta''$, the laws of the processes $Z^{\theta'}$ and $Z^{\theta''}$ are mutually singular. Thus, $\theta'$ and $\theta''$ are perfectly identified thanks to experiment. □



2.2 Cramer-Rao Inequality

Theorem 2.2.1 ([11] p. 73) Let $\psi : \mathbb{R}^d \to \mathbb{R}^m$ be differentiable and $(P_\theta)_{\theta \in \Theta}$ be a regular model with $\forall \theta \in \Theta$, $\det(J(\theta)) \ne 0$. If $T(X)$ is an unbiased estimator of $\psi(\theta)$ such that $E[\|T(X)\|^2 \mid V = \theta]$ is locally bounded in $\theta$ then

$$E\big[(T(X) - \psi(\theta))(T(X) - \psi(\theta))^t \mid V = \theta\big] \ge \psi'(\theta)\, J^{-1}(\theta)\, \psi'(\theta)^t$$

where $\ge$ is the order relation between symmetric matrices defined by the cone of positive symmetric ones.



Remark B: An estimator $T(X)$ fulfilling the hypotheses of the preceding theorem is said to be a regular unbiased estimator of $\psi(\theta)$. □

From now on, we suppose that the Fisher information matrix is regular. Thus, the Cramer-Rao inequality gives a bound of estimation for the quadratic risk. Let us have a look at the error structure (6) we want to determine. If the components of the identity are in $\mathbb{D}_V$, according to the functional calculus, we have for $F \in Lip^1(\Theta) = \{F \in C^1(\Theta, \mathbb{R}) \text{ and Lipschitz}\}$,

$$\Gamma_V[F] = (\nabla F)^t\, \Gamma_V[Id]\, (\nabla F)$$

where the matrix $\Gamma_V[Id](\theta)$ represents the error of estimation on $V$ given $V = \theta$. Since $\Gamma_V$ takes its significance from a calculus on variances, the Cramer-Rao inequality leads us to state the fundamental identification

$$\Gamma_V[Id] = J^{-1}. \qquad (F.I)$$
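Combined with the functional calculus, the fundamental identification gives $\Gamma_V[F] = (\nabla F)^t J^{-1} (\nabla F)$, which is easy to evaluate pointwise. The following sketch works in dimension $d = 2$ and uses, as an assumed example not taken from the paper, the Gaussian model $N(m, s^2)$ with parameter $\theta = (m, s)$, whose Fisher information for one observation is $\mathrm{diag}(1/s^2, 2/s^2)$. The helper names are hypothetical.

```python
def inverse_2x2(matrix):
    """Inverse of a 2x2 matrix given as nested lists."""
    (a, b), (c, d) = matrix
    det = a * d - b * c
    if det == 0.0:
        raise ValueError("singular Fisher information matrix")
    return [[d / det, -b / det], [-c / det, a / det]]


def gamma_v(grad_f, fisher):
    """Quadratic error on F under the fundamental identification:
    (grad F)^t J^{-1} (grad F), in dimension 2."""
    inv = inverse_2x2(fisher)
    return sum(grad_f[i] * inv[i][j] * grad_f[j] for i in range(2) for j in range(2))


m, s = 1.0, 2.0
fisher = [[1.0 / s ** 2, 0.0], [0.0, 2.0 / s ** 2]]

# F(m, s) = m / s, so grad F = (1/s, -m/s^2).
grad_f = [1.0 / s, -m / s ** 2]
print(gamma_v(grad_f, fisher))  # equals 1 + m^2 / (2 s^2)
```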

Just as the statistical identification of a probability space presupposes the $\sigma$-additivity of the measure, we want to determine an error structure deriving from experiment in which $\mathcal{E}_V$ is a closed form. According to the fundamental identification we make the following assumption:

Hypothesis (E): From now on, we suppose the existence of a dense vector subspace of $L^2(\rho)$ denoted by $\mathbb{D}_V$ and the existence of an operator $\Gamma_V$ fulfilling conditions 1), 2) and 3) such that $Lip^1(\Theta) \subset \mathbb{D}_V$ and, for all $F$ in $Lip^1(\Theta)$, $\Gamma_V[F] = F' J^{-1} (F')^t$. Moreover, as $\mathbb{D}_V$ may not be uniquely defined, we take it minimal for inclusion, which implies the density of $Lip^1(\Theta)$ in $\mathbb{D}_V$ for the norm $\| \cdot \|_{\mathbb{D}_V}$.

This hypothesis dictates conditions on $\rho$ and $J^{-1}$ which are often fulfilled, as seen in the following proposition (see also [9]):

Proposition 2.2.2 a) Let $\Theta$ be a bounded open set of $\mathbb{R}^d$ of the form $\Theta = \prod_{i=1}^d \, ]\theta_0^i, \theta_1^i[$ where the $\theta_j^i$ are real numbers such that $\theta_1^i > \theta_0^i$. We shall assume that $\rho$ is a probability measure which is absolutely continuous with respect to the Lebesgue measure on $\Theta$ with a positive density $q$ in $Lip^1(\Theta)$. Suppose that the model $(P_\theta)_{\theta \in \Theta}$ can be extended to a regular model on an open set $\Theta'$ such that $\bar{\Theta} \subset \Theta'$. Then, hypothesis (E) is fulfilled.

b) When $\Theta = \mathbb{R}$, if we assume that $\int_\Theta \theta^2 \, d\rho(\theta) < \infty$ and that $\frac{1}{J}$ belongs to $L^1(\rho)$, hypothesis (E) is equivalent to the conditions of the Hamza theorem (JM/ p. 105).



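Proof: a) Let $(F_n)_{n \in \mathbb{N}}$ be a sequence in $Lip^1(\Theta)$ such that $F_n \to 0$ in $L^2(\rho)$ and $\Gamma_V[F_n - F_m] \to 0$ in $L^1(\rho)$ where $\Gamma_V : Lip^1(\Theta) \to L^1(\rho)$ is well-defined by $\Gamma_V[F] = F' J^{-1} (F')^t$. If we show that $\Gamma_V[F_n] \to 0$ in $L^1(\rho)$, the conclusion follows according to [9] p. 4.

One defines the mapping $\Phi$:

$$\Phi : \bar{\Theta} \times S_d \to \mathbb{R}_+^*, \qquad (\theta, u) \mapsto \sum_{i,j} a_{ij}(\theta)\, u_i u_j$$

where $S_d$ is the unit sphere of $\mathbb{R}^d$ and where the coefficients of $J^{-1}$ are denoted by $a_{ij}$.

The function $\Phi$ is continuous on a compact set, thus there exist $\delta, \delta' > 0$ such that $\delta \le \Phi \le \delta'$. It implies

$$\delta\, |\nabla F_n - \nabla F_m|^2 \le \Gamma_V[F_n - F_m] \le \delta'\, |\nabla F_n - \nabla F_m|^2. \qquad (*)$$

Hence, $\nabla F_n$ is a Cauchy sequence in $L^2(\rho; \mathbb{R}^d)$ and there is a function $G = (G_1, \dots, G_d)$ in $L^2(\rho; \mathbb{R}^d)$ satisfying, for all $i \in \{1, \dots, d\}$, $\partial_i F_n \to G_i$ in $L^2(\rho)$.

Let $\varphi$ be a function in $C_K^\infty(\Theta) = \{F \in C^\infty(\Theta, \mathbb{R}) \text{ with compact support}\}$. One notices that $\varphi$, $q$, $F_n$ are Lipschitz and can be extended to $\bar{\Theta}$. Thus, by the integration by parts formula we obtain

$$- \int_\Theta F_n\, (\partial_i \varphi)\, q \, d\theta = \int_\Theta (\partial_i F_n)\, \varphi\, q \, d\theta + \int_\Theta F_n\, \varphi\, \partial_i q \, d\theta$$

and by passing to the limit, it follows that $\forall \varphi \in C_K^\infty(\Theta)$,

$$\int_\Theta G_i\, \varphi\, q \, d\theta = 0$$

thus $G_i = 0$. We can conclude using the inequality (*).

b) The Hamza theorem gives necessary and sufficient conditions for the existence of an error structure $S = (\mathbb{R}, \mathcal{B}(\mathbb{R}), \rho, \mathbb{D}, \Gamma)$ such that $C_K^\infty(\mathbb{R}) \subset \mathbb{D}$ and $\Gamma[F] = \frac{F'^2}{J}$ on $C_K^\infty(\mathbb{R})$.

Let $(F_n)_{n \in \mathbb{N}}$ be a sequence in $C_K^\infty(\mathbb{R})$ with the same Lipschitz constant 1 such that $F_n \to Id$ everywhere with $\forall n$, $|F_n| \le |Id|$ and $F_n' \to 1$ everywhere. Using the dominated convergence theorem and the closedness of the form we obtain that $Id \in \mathbb{D}$, hence $Lip^1(\mathbb{R}) \subset \mathbb{D}$ and $\Gamma[F] = \frac{F'^2}{J}$ for $F \in Lip^1(\mathbb{R})$. The result follows naturally. □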

Remarks C: i) The statistical situation with a constant information matrix is often encountered in classical parametric models (see [13]): location family, normal models with fixed coefficient of variation, logistic model, scale parameter. In this case the condition of extension of the model $(P_\theta)_{\theta \in \Theta}$ can be removed in a).

ii) The operator $\Gamma_V$ is bilinear. It is possible to introduce a new operator, the gradient, denoted by $\nabla_V$, which can be seen as a signed and linear version of the standard deviation of the error and satisfies, $\forall F \in \mathbb{D}_V$,

$$\Gamma_V[F] = \langle \nabla_V[F], \nabla_V[F] \rangle.$$

Since the error structure $S_V$ is defined on a finite dimensional space it is easy to construct $\nabla_V$ by putting

$$\nabla_V : \mathbb{D}_V \to L^2(\rho; \mathbb{R}^d), \qquad F \mapsto R\,(F')^t$$

where $R$ is the square root of $J^{-1}$. The gradient fulfills the classical differentiation chain rule.

iii) We can notice that the fundamental identification gives, without other hypotheses, a second order calculus with variances and biases as mentioned in the introduction. In fact, we can associate to the Dirichlet form $\mathcal{E}_V$ a unique self-adjoint operator $A_V$ (see [3], [9]), called the infinitesimal generator. It has a domain $D(A_V)$ included in $\mathbb{D}_V$ and it takes its values in $L^2(\rho)$. Moreover we have

$$A_V[F(U)] = F'(U)\, A_V[U] + \frac{1}{2} F''(U)\, \Gamma_V[U]$$

when $U \in D(A_V)$, $\Gamma_V[U] \in L^2(\rho)$ and $F : \mathbb{R} \to \mathbb{R}$ is a function of class $C^2$ with bounded derivatives. Thus, the preceding formula expresses the propagation of the conditional expectation of the error in the same way as (2). □

Now, we want to test the robustness of the fundamental identification by comparing its 
properties with the well-known behavior of the Fisher information in the classical framework 
of parametric estimation. 

3 Change of variables: the injective case.

We are going to show the stability of the fundamental identification under regular changes of variables.

3.1 The regular injective case.

Definition 3.1.1 We suppose that $\psi : \Theta \to \mathbb{R}^d$ is injective of class $C^1 \cap Lip$. This change of variables is said to be regular if $\det(\psi'(x)) \ne 0$ for all $x$.

From the local inversion theorem, it follows that $\psi$ is a $C^1$-diffeomorphism onto its image and $\psi(\Theta)$ is an open set of $\mathbb{R}^d$.

Now, we want to equip $\psi(\Theta)$ with an error structure that expresses the intrinsic accuracy of our knowledge on $\psi(\theta)$. There are two natural ways to proceed.

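3.1.1 From the estimation point of view.

In the injective case, the change of variables is just a reparameterisation of the model. To estimate $\psi(\theta)$ we use the model $(P_{\psi^{-1}(a)}, a \in \psi(\Theta))$. Since $dP_{\psi^{-1}(a)}(x) = f(x, \psi^{-1}(a))\, d\mu(x)$, we can see easily that this model is regular. Let us have a look at the error structure we obtain using the fundamental identification. The operator $\Gamma_{\psi(V)}$ is defined on $Lip^1(\psi(\Theta))$ by

$$\Gamma_{\psi(V)}[F](a) = (\nabla_a F)^t\, \big(J_{\psi(V)}(a)\big)^{-1}\, (\nabla_a F) \qquad \forall a \in \psi(\Theta)$$

where $J_{\psi(V)}$ is the Fisher information matrix of the regular model $(P_{\psi^{-1}(a)}, a \in \psi(\Theta))$. Moreover, as $\forall a \in \psi(\Theta)$,

$$J_{\psi(V)}(a) = \big[(\psi'(\psi^{-1}(a)))^{-1}\big]^t\, \big[J(\psi^{-1}(a))\big]\, \big[(\psi'(\psi^{-1}(a)))^{-1}\big]$$
one has for $F \in Lip^1(\psi(\Theta))$

$$\Gamma_{\psi(V)}[F](a) = (\nabla_a F)^t\, \big[\psi'(\psi^{-1}(a))\big]\, \big[J(\psi^{-1}(a))\big]^{-1}\, \big[\psi'(\psi^{-1}(a))\big]^t\, (\nabla_a F).$$

Using that $\psi$ is injective of class $C^1 \cap Lip$, from hypothesis (E) it follows that the form $\mathcal{E}_{\psi(V)}$ defined on $Lip^1(\psi(\Theta))$ by

$$\mathcal{E}_{\psi(V)}[F] = \frac{1}{2} \int \Gamma_{\psi(V)}[F] \, d\psi_* \rho$$

is closable and we denote by $\mathbb{D}_{\psi(V)}$ the domain of its smallest closed extension. Thus, the error structure associated to the fundamental identification for the estimation of $\psi(\theta)$ is

$$S_{\psi(V)} = \big(\psi(\Theta), \mathcal{B}(\psi(\Theta)), \psi_* \rho, \mathbb{D}_{\psi(V)}, \Gamma_{\psi(V)}\big).$$

Remark D: When $d = 1$ one obtains

$$J_{\psi(V)}(\psi(\theta)) = \frac{J(\theta)}{\psi'(\theta)^2}.$$

Hence, if $\psi$ is flat enough at $\theta$, $\psi(\theta)$ can be estimated more accurately than $\theta$. This property is intuitively coherent since a value $\theta'$ at a given small distance from $\theta$ will lead to a smaller deviation of $\psi(\theta')$ from $\psi(\theta)$ the smaller the value of $|\psi'(\theta)|$ is. □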
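The one-dimensional reparameterisation rule of Remark D, $J_{\psi(V)}(\psi(\theta)) = J(\theta)/\psi'(\theta)^2$, can be checked on a closed-form case. The sketch below uses an assumed example not taken from the paper: the exponential model with rate $\theta$, for which $J(\theta) = 1/\theta^2$, reparameterised by its mean $a = \psi(\theta) = 1/\theta$, for which the information computed directly is $1/a^2$. All names are hypothetical.

```python
import math


def reparametrised_information(J, dpsi, theta):
    """Remark D in dimension 1: information on psi(theta) at the point psi(theta)."""
    return J(theta) / dpsi(theta) ** 2


# Exponential model with rate theta: J(theta) = 1 / theta^2 (assumed example).
J_rate = lambda t: 1.0 / t ** 2
# Change of variables psi(theta) = 1 / theta (the mean), psi'(theta) = -1 / theta^2.
psi = lambda t: 1.0 / t
dpsi = lambda t: -1.0 / t ** 2

theta = 2.0
a = psi(theta)

via_rule = reparametrised_information(J_rate, dpsi, theta)
# Information obtained directly in the mean parameterisation (density exp(-x/a)/a).
direct = 1.0 / a ** 2

print(via_rule, direct)
# The associated quadratic errors agree as well: psi'(theta)^2 / J(theta) = 1 / direct.
print(dpsi(theta) ** 2 / J_rate(theta), 1.0 / direct)
```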

3.1.2 From the error calculus point of view

Among the advantages of the error calculus based on Dirichlet forms, let us emphasize here its practical flexibility. It is easy to define both the product of error structures and the image of an error structure by a mapping. The following definition is the rigorous formulation of the intuitive expression (4) which corresponded to a change of observation in our preliminary study of error calculus.

Definition 3.1.2 Let $S = (W, \mathcal{W}, m, \mathbb{D}, \Gamma)$ be an error structure and $Y : W \to \mathbb{R}^d$, $Y \in \mathbb{D}^d$, such that $Y(W)$ is an open set of $\mathbb{R}^d$. Let us define $\tilde{\mathbb{D}}_Y = \{f \in L^2(Y_* m) \mid f(Y) \in \mathbb{D}\}$ and for $f \in \tilde{\mathbb{D}}_Y$, $\tilde{\Gamma}_Y[f](x) = E_m[\Gamma[f(Y)] \mid Y = x]$.

If we denote by $\mathbb{D}_Y$ the closure of $Lip^1(Y(W))$ in $(\tilde{\mathbb{D}}_Y, \| \cdot \|_{\tilde{\mathbb{D}}_Y})$ and by $\Gamma_Y$ the restriction of $\tilde{\Gamma}_Y$ to $\mathbb{D}_Y$ then

$$Y_* S = (Y(W), \mathcal{B}(Y(W)), Y_* m, \mathbb{D}_Y, \Gamma_Y)$$
is an error structure called the image structure of $S$ by $Y$.

Let us study the image of $S_V$ by $\psi$, which is another natural way to endow $\psi(\Theta)$ with an error structure. We denote by $\Gamma_\psi$ the quadratic error operator of the image structure $\psi_* S_V$. For $F \in Lip^1(\psi(\Theta))$ one has, $\forall a \in Im(\psi)$,

$$E_\rho\big[\Gamma_V[F(\psi)] \mid \psi = a\big] = E_\rho\big[\nabla(F(\psi))^t\, J^{-1}\, \nabla(F(\psi)) \mid \psi = a\big]$$

and

$$\Gamma_{\psi(V)}[F] = \Gamma_\psi[F] \qquad \psi_* \rho\text{-a.e.}$$
Thus, $\Gamma_{\psi(V)}$ and $\Gamma_\psi$ are equal on $Lip^1(\psi(\Theta))$.

Using the density of $Lip^1(\psi(\Theta))$ in $(\mathbb{D}_{\psi(V)}, \| \cdot \|_{\mathbb{D}_{\psi(V)}})$, we have the following expected property:

Proposition 3.1.3 The fundamental identification is preserved by the transformation $\psi$. In other terms:

$$\psi_* S_V = S_{\psi(V)}.$$

Remark E: Suppose we are studying the sensitivity of a physical or financial model depending on the parameter $\theta$ to small random perturbations by using an error structure on $\Theta$ and the functional calculus for $\Gamma$ to compute the propagation of errors on the outputs of the model. If the error structure is obtained from the Fisher information matrix of a statistical model as above, the preceding invariance result means that the accuracy on $\theta$ has a physical significance, independently of mathematical reparameterization. □

3.2 The non-regular injective case

After the regular case studied in the preceding section, let us see what happens at a point $\theta$ such that $\psi'(\theta)$ is singular. First, we suppose that $d = 1$.

Let $a_0$ be equal to $\psi(\theta_0)$ with $\theta_0 \in \Theta$ and $\psi'(\theta_0) = 0$. We can see easily that the model $(P_{\psi^{-1}(a)}, a \in \psi(\Theta))$ possesses an irregularity at $a_0$. Intuitively, as far as estimation is concerned, this situation is not harmful because it induces a good approximation of $\psi(\theta_0)$ (see Remark D). If we put $J_{\psi(V)}(a_0) = +\infty$ it follows

$$\Gamma_{\psi(V)}[Id](a_0) = \frac{1}{J_{\psi(V)}(a_0)} = 0 = \Gamma_\psi[Id](a_0)$$

where $\Gamma_\psi$ is the quadratic error operator of the image structure $\psi_* S_V$.

In the general case, since $J(\theta_0)$ is supposed to be positive definite, we can reduce simultaneously $\psi'(\theta_0)$ and $J(\theta_0)$ and work component by component. If $\psi'(\theta_0)$ is singular, there exist eigenvectors for the eigenvalue 0 which correspond to directions of infinite information for $J_{\psi(V)}(a_0)$. The other eigendirections are dealt with as in the regular case.

We can see that the fundamental identification is still stable in this case.

Remarks F: i) The concept of infinite information appears in asymptotic statistics where it expresses a faster convergence of the maximum likelihood estimator toward the parameter.

ii) We have seen that, for injective changes of variables, the error structure obtained by estimating $\psi(\theta)$ directly coincides with the image by $\psi$ of the structure associated to the estimation of $\theta$. This phenomenon can be viewed as a sufficiency principle (well known for the Fisher information [11], p. 70) because when $\psi$ is injective, $P_\theta$ depends on $\theta$ only through $\psi$.

iii) Proposition 3.1.3 is based on the simple relation between $J_{\psi(V)}$ and $J$. This property of the Fisher information is not fulfilled by other types of information bound. For example, for the bounds of Bhattacharyya type (see [11]) (which involve higher derivatives and are more precise) it is impossible to obtain such a coherence property. The hypotheses of regular models are the right level of axiomatization for our study. □



4 The non-injective case.

We are now in a special situation: we have put in correspondence an error structure and a parametric model thanks to the Fisher information. But on one side (error structures) non-injective changes of variables are allowed (definition 3.1.2) and on the other side (statistical models) they meet difficulties. We take advantage of this remark to propose a new framework for the estimation of a parameter in this case, which is directly linked with the notion of error structure.

Here we suppose that $\psi$ is a function in $Lip^1(\Theta)$, not necessarily injective but such that $\psi(\Theta)$ is an open set of $\mathbb{R}^d$ (in order to apply definition 3.1.2 with $Y = \psi$).

To estimate $\psi(\theta)$ the reparameterisation introduced in the previous section is meaningless. To avoid this problem, we give a new protocol.

4.1 Estimation protocol of $\psi(\theta)$ when $\psi$ is not injective.

To estimate $\theta$ we use a regular model $(P_\theta)_{\theta \in \Theta}$ such that hypothesis (E) is fulfilled. In this section the random variables $X_1, \dots, X_n$ defined on $(\Omega, \mathcal{A}, \mathbb{P})$ with values in $(E, \mathcal{F})$ will be, given $V = \theta$, an $n$-sample of $P_\theta$.

To estimate $\psi(\theta)$, it is natural to use the model $(Q_a)_{a \in \psi(\Theta)}$ generated by the observation of $X_1, \dots, X_n$ given $\psi(V) = a$. From the definition of the conditional expectation it follows that

$$dQ_a(x) = E_\rho[f(x, \cdot) \mid \psi = a]\, d\mu(x).$$
In particular, we need a global knowledge of $f(x, \cdot)$ to compute $Q_a$.

Remarks G: i) When $\psi$ is injective, the preceding protocol coincides with the reparameterisation. In this case a pointwise knowledge is sufficient.

ii) The case of non-injective changes of parameters is often tackled in the literature in the following restrictive form. When $d = 1$, we consider a point $\theta_0$ such that $\psi'(\theta_0) \ne 0$. Since $\psi$ is in $C^1(\Theta, \mathbb{R})$, according to the local inversion theorem there exist $\theta_0^{min}$, $\theta_0^{max}$ such that $\psi : ]\theta_0^{min}, \theta_0^{max}[ \to \psi(]\theta_0^{min}, \theta_0^{max}[)$ is a $C^1$-diffeomorphism with an inverse denoted by $\psi_{\theta_0}^{-1}$ (which depends on $\theta_0$, contrary to the injective case).

If we suppose that previous observation leads us to believe that $\theta$ is in $]\theta_0^{min}, \theta_0^{max}[$, locally we are going back to the case treated in section 3. Thus, we set $\forall a \in \psi(]\theta_0^{min}, \theta_0^{max}[)$,

$$J_{\theta_0}(a) = \frac{J(\psi_{\theta_0}^{-1}(a))}{\psi'(\psi_{\theta_0}^{-1}(a))^2}.$$

The quantity $J_{\theta_0}(a)$ is called the local Fisher information because it takes into account only one antecedent of $a$.

When we do not have any a priori information on $\theta$, one has to use a concept which expresses the entire behavior of $\psi$. □

Since $\psi$ is non-injective, the model $(Q_a)_{a \in \psi(\Theta)}$ may present irregularities. Thus, the Fisher information may be undefined. Moreover, even if it exists, the information matrix is not easy to compute. So we are going to show the relevance of error calculus in this case, showing that the operator $\Gamma_\psi$ of the image structure $\psi_* S_V$ is a substitute for the inverse of the Fisher information in the sense that it gives a simple bound of estimation and is linked to asymptotic statistics.

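4.2 $\Gamma_\psi$ as an estimation bound.

To simplify, let us suppose that $d = 1$. We denote by $\Gamma_\psi$ the quadratic error operator of the image structure $\psi_* S_V$.

Using a regular parametric model to estimate $\theta$, we have seen that for a regular unbiased estimator $T(X)$ of $\psi(\theta)$, the Cramer-Rao inequality

$$E[(T(X) - \psi(\theta))^2 \mid V = \theta] \ge \frac{\psi'^2(\theta)}{J(\theta)} \qquad (7)$$

gave a bound of the quadratic risk and led us to interpret $J$ as the information on $\theta$ contained in the observation $X$. In the same way, when the estimators are built with the independent observations $(X_1, \dots, X_n)$, it is easy to see that the additivity property of the Fisher information matrix ensures that

$$E[(T(X_1, \dots, X_n) - \psi(\theta))^2 \mid V = \theta] \ge \frac{\psi'^2(\theta)}{n J(\theta)}$$

if $E[T(X_1, \dots, X_n) \mid V = \theta] = \psi(\theta)$.

Thus, conditioning (7) with respect to $\psi$ one has

$$E[(T(X) - a)^2 \mid \psi(V) = a] \ge E_\rho\Big[\frac{\psi'^2}{J} \,\Big|\, \psi = a\Big] = \Gamma_\psi[Id](a)$$

and $\Gamma_\psi[Id]$ appears as a natural bound of the problem. Similarly, one obtains

$$E[(T(X_1, \dots, X_n) - a)^2 \mid \psi(V) = a] \ge \frac{\Gamma_\psi[Id](a)}{n}.$$

Remark H: $\frac{1}{\Gamma_\psi[Id]}$ can be seen as an additive information when independent observations are combined. □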
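The bound $E_\rho[\psi'^2 / J \mid \psi = a]$ can be evaluated explicitly when $a$ has finitely many antecedents. The sketch below is an illustration under assumptions of ours: $d = 1$, the prior has a density $q$, and the conditional law of $\theta$ given $\psi(\theta) = a$ then charges each antecedent $\theta_k$ with a weight proportional to $q(\theta_k)/|\psi'(\theta_k)|$. The example takes $\psi(\theta) = \theta^2$ on $\Theta = ]-1, 0[ \, \cup \, ]0, 1[$ (so that $\psi(\Theta) = ]0, 1[$ is open), a uniform prior, and a hypothetical information $J(\theta) = 2 + \theta$.

```python
import math


def gamma_psi_id(a, preimages, dpsi, J, q):
    """Conditional expectation of psi'^2 / J given psi = a, when a has finitely
    many antecedents; each one is weighted by q(theta) / |psi'(theta)|."""
    points = preimages(a)
    weights = [q(t) / abs(dpsi(t)) for t in points]
    values = [dpsi(t) ** 2 / J(t) for t in points]
    return sum(w * v for w, v in zip(weights, values)) / sum(weights)


# Hypothetical setting: psi(theta) = theta^2, uniform prior, J(theta) = 2 + theta.
preimages = lambda a: [-math.sqrt(a), math.sqrt(a)]
dpsi = lambda t: 2.0 * t
J = lambda t: 2.0 + t
q = lambda t: 0.5

a = 0.25
bound_one_observation = gamma_psi_id(a, preimages, dpsi, J, q)
bound_n_observations = bound_one_observation / 10  # n = 10 independent observations

# In this example the bound has the closed form 8a / (4 - a).
print(bound_one_observation, 8.0 * a / (4.0 - a))
```

Both antecedents contribute here, in contrast with the local Fisher information, which keeps only one of them.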



4.3 Links with asymptotic statistics.

To keep the question of existence and uniqueness of the maximum likelihood estimator simple, we suppose that for the model $(P_\theta)_{\theta \in \Theta}$, for all $n \in \mathbb{N}$, for all $(x_1, \dots, x_n) \in E^n$, the equation

$$\sum_{i=1}^n \frac{\partial}{\partial \theta} \log f(x_i, \theta) = 0$$

has a unique solution denoted by $\hat{\theta}_n(x_1, \dots, x_n)$ which is a maximum for the function $\theta \to \prod_{i=1}^n f(x_i, \theta)$. In this section we assume that $\Theta$ is a convex bounded subset of $\mathbb{R}$ (this could easily be extended to any finite dimension).

In order to show that the operator $\Gamma_\psi$ of the image structure $\psi_* S_V$ is the keystone of some asymptotic results, one requires preliminary knowledge concerning the convergence of the sequence of estimators $(\hat{\theta}_n(X_1, \dots, X_n))_{n \in \mathbb{N}}$.

4.3.1 Convergence of the maximum likelihood estimator.



We essentially refer the reader to [10], [11] for the proof of the results presented here and for complementary details.

The asymptotic techniques used in this section can be easily extended to a more general framework than the case of experiments based on the observation of $n$-samples (especially for applications to stochastic processes). These techniques are no longer based on the historical approach using Taylor's formula (see for example [12] p. 469) but on large deviation tools.

An important idea of Ibragimov and Has'minskii has been to study the likelihood ratio

$$Z_{n,\theta}(u) = \prod_{i=1}^n \frac{f(X_i, \theta + \frac{u}{\sqrt{n}})}{f(X_i, \theta)} \quad \text{with } u \in U_{n,\theta} = \Big\{u \in \mathbb{R} \;\Big|\; \theta + \frac{u}{\sqrt{n}} \in \Theta\Big\}.$$

Its asymptotic behavior is linked to that of the maximum likelihood estimator by the following inequality

$$\mathbb{P}\big(|\sqrt{n}(\hat{\theta}_n - \theta)| > H \mid V = \theta\big) \le \mathbb{P}\Big(\sup_{|u| > H} Z_{n,\theta}(u) \ge 1 \;\Big|\; V = \theta\Big).$$

Furthermore this quantity is connected to the Hellinger distance:

$$E\big[Z_{n,\theta}^{1/2}(u) \mid V = \theta\big] = \Big(1 - \frac{1}{2}\, r\big(P_\theta, P_{\theta + \frac{u}{\sqrt{n}}}\big)\Big)^n$$
where, for a given parametric model $(P_\theta)$, the Hellinger distance $r$ is defined by

$$r(P_\theta, P_{\theta'}) = \int \Big(\sqrt{f(x, \theta)} - \sqrt{f(x, \theta')}\Big)^2 d\mu(x).$$

It is a measurement of identifiability, i.e. the capacity of a model to distinguish two different values of the parameter $\theta$.

The following theorem gives sufficient conditions for the consistency of the maximum likelihood estimator.



Theorem 4.3.1 ([11] p. 42) Let us suppose that
1) $\forall \theta$, $\forall n$, the function $u \to Z_{n,\theta}(u)$ is continuous
2) $\forall \theta$, $\exists M > 0$, $\exists m > 0$ such that $\forall n$

$$\sup_{|u_1| \le R,\, |u_2| \le R} |u_1 - u_2|^{-2}\, E\big[\,|Z_{n,\theta}^{1/2}(u_1) - Z_{n,\theta}^{1/2}(u_2)|^2 \mid V = \theta\big] \le M(1 + R^m)$$

3) $\exists a > 0$ such that $\forall u \in U_{n,\theta}$, $\forall n$

Then, $\forall \theta$, $\exists B > 0$, $\exists b > 0$ such that $\forall \varepsilon > 0$, for $n$ sufficiently large, one has

$$E\big[1_{\{\sqrt{n}\,|\hat{\theta}_n(X_1, \dots, X_n) - \theta| > \varepsilon\}} \mid V = \theta\big] \le B e^{-b \varepsilon^2}.$$

Consequently we obtain the almost sure convergence of $\hat{\theta}_n$ toward $\theta$.

Remarks I: i) We can notice that hypothesis 3) implies the identifiability of the model. This condition is necessary because one cannot find consistent estimators for a non-identifiable model.

ii) There exists a uniform extension of the preceding theorem: if $K$ is a compact set included in $\Theta$ and if hypotheses 2) and 3) are fulfilled uniformly for $\theta \in K$ then $\exists b(K) > 0$, $\exists B(K) > 0$ such that $\forall \varepsilon > 0$, for large $n$,

$$\sup_{\theta \in K} E\big[1_{\{\sqrt{n}\,|\hat{\theta}_n - \theta| > \varepsilon\}} \mid V = \theta\big] \le B e^{-b \varepsilon^2}.$$

The hypotheses of theorem 4.3.1 may appear restrictive, but the following result shows that they are satisfied for a large class of regular models.

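Proposition 4.3.2 ([11] p. 81) If $(P_\theta)$ is a regular model fulfilling

1) $0 < \inf_{\Theta} J(\theta) \leq \sup_{\Theta} J(\theta) < \infty$

2) $\forall\theta$, $\forall\delta > 0$,

$$\inf_{u \in U_{1,\theta},\ |u| > \delta} r(P_\theta, P_{\theta+u}) > 0$$

then the hypotheses of theorem 4.3.1 hold.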

From a practical point of view, the condition of local asymptotic normality introduced in
the following theorem yields a useful result for constructing confidence intervals. It also
has a uniform version.

Theorem 4.3.3 ([11] p. 185) We suppose that the hypotheses of theorem 4.3.1 are fulfilled.
Moreover we assume that the model satisfies the local asymptotic normality condition introduced
by Le Cam: for all $\theta$, the sequence of stochastic processes $(Z_{n,\theta}(u))$ converges in the sense of
finite-dimensional distributions to the process $Z_\theta(u) = e^{u\Delta - \frac{1}{2}J(\theta)u^2}$, where $\Delta$ is a random
variable with law $\mathcal{N}(0, J(\theta))$. Then, given $V = \theta$,

1) $\sqrt{n}(\hat{\theta}_n - \theta)$ converges in law to $\mathcal{N}(0, 1/J(\theta))$

2) for all $p \in \mathbb{N}^*$, $\mathbb{E}[(\sqrt{n}(\hat{\theta}_n - \theta))^p \mid V = \theta] \to m_p$,
where $m_p$ is the $p$-th moment of the law $\mathcal{N}(0, 1/J(\theta))$.

Remarks J: i) The hypotheses of theorem 4.3.1 lead to the tightness of the process $(Z_{n,\theta}(u))$
in the space of continuous functions vanishing at infinity. The pointwise convergence of this
sequence becomes functional and gives 1).

ii) The maximum likelihood estimator is asymptotically unbiased and asymptotically achieves
the bound of the Cramer-Rao inequality.

iii) Since $J$ is continuous under the hypotheses of regular models, the construction of
asymptotic confidence intervals is done classically. □

Now, one of the most important properties of regular models is the following:

Proposition 4.3.4 ([11] p. 114) The condition of local asymptotic normality is fulfilled for
regular models.

In the following section we use those asymptotic results to give a new interpretation of $\Gamma^V_\psi[Id]$.

4.3.2 $\Gamma^V_\psi[Id]$ as an asymptotic variance

We are able to exhibit a consistent estimator in the problem of the direct estimation of
$\psi(\theta)$ using the experiment generated by the observation of $(X_1, \ldots, X_n)$, given $\psi(V) = a$. The
quantity $\Gamma^V_\psi[Id]$ will appear in the limit theorems associated to this statistical procedure.

Proposition 4.3.5 Under the hypotheses of proposition 4.3.2 one has, $\forall a \in \psi(\Theta)$:

1) $\forall\varepsilon > 0$,

$$\mathbb{E}\big[\mathbf{1}_{\{|\psi(\hat{\theta}_n(X_1,\ldots,X_n)) - \psi(V)| > \varepsilon\}} \mid \psi(V) = a\big] \to 0.$$

2) Given $\psi(V) = a$,

$$\sqrt{n}\,(\psi(\hat{\theta}_n) - a) \xrightarrow{\ \mathcal{L}(\mathbb{P})\ } G_a$$

where $G_a$ is a random variable with the following density

$$g(x,a) = \mathbb{E}_\rho\Big[\mathbf{1}_{\{\psi' \neq 0\}}\, \frac{1}{\sqrt{2\pi\,\psi'^2/J}}\; e^{-\frac{x^2 J}{2\psi'^2}} \,\Big|\, \psi = a\Big]$$

with respect to Lebesgue measure on $\mathbb{R}$ ($G_a$ has a variance equal to $\Gamma^V_\psi[Id](a)$).



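Proof: 1) We denote by $C$ the Lipschitz constant of $\psi$.

According to Fubini's theorem and by definition of the conditional expectation, $\forall (x_1, \ldots, x_n) \in E^n$, $\forall a \in \psi(\Theta)$,

$$\int \mathbf{1}_{\{|\psi(\hat{\theta}_n(x_1,\ldots,x_n)) - a| > \varepsilon\}}\, \mathbb{E}_\rho[f(x_1,\cdot) \cdots f(x_n,\cdot) \mid \psi = a]\, d\mu(x_1) \cdots d\mu(x_n)$$

is equal to

$$\mathbb{E}_\rho\Big[\int \mathbf{1}_{\{|\psi(\hat{\theta}_n(x_1,\ldots,x_n)) - a| > \varepsilon\}}\, f(x_1,\cdot) \cdots f(x_n,\cdot)\, d\mu(x_1) \cdots d\mu(x_n) \,\Big|\, \psi = a\Big].$$

But we have

$$\mathbf{1}_{\{|\psi(\hat{\theta}_n(x_1,\ldots,x_n)) - \psi(\theta)| > \varepsilon\}} \leq \mathbf{1}_{\{|\hat{\theta}_n(x_1,\ldots,x_n) - \theta| > \varepsilon/C\}}$$

and the result follows by theorem 4.3.1 and the dominated convergence theorem.
2) When $\psi'(\theta_0) = 0$, theorem 4.3.3 yields

$$\mathbb{E}\big(\mathbf{1}_{\{\sqrt{n}\,|\psi(\hat{\theta}_n(X_1,\ldots,X_n)) - \psi(\theta_0)| > \varepsilon\}} \mid V = \theta_0\big) \to 0$$

and when $\psi'(\theta_0) \neq 0$, Slutsky's lemma (see [12] p. 86) gives that, given $V = \theta_0$,

$$\sqrt{n}\,\big(\psi(\hat{\theta}_n(X_1,\ldots,X_n)) - \psi(\theta_0)\big) \to \mathcal{N}\Big(0, \frac{\psi'^2(\theta_0)}{J(\theta_0)}\Big) \text{ in law.}$$

If $F$ is a bounded continuous function, using the same argument as in 1), one has that

$$\int F\big(\sqrt{n}(\psi(\hat{\theta}_n) - a)\big)\, \mathbb{E}_\rho[f(x_1,\cdot) \cdots f(x_n,\cdot) \mid \psi = a]\, d\mu(x_1) \cdots d\mu(x_n)$$

is equal to

$$\mathbb{E}_\rho\Big[\int F\big(\sqrt{n}(\psi(\hat{\theta}_n(x_1,\ldots,x_n)) - \psi)\big)\, f(x_1,\cdot) \cdots f(x_n,\cdot)\, d\mu(x_1) \cdots d\mu(x_n) \,\Big|\, \psi = a\Big]$$

and the result comes by dominated convergence. □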

Remarks K: i) When $\psi$ is injective, $\psi(\hat{\theta}_n)$ is the maximum likelihood estimator associated
to the model $(Q_a)_{a \in \psi(\Theta)}$.

ii) Using the Borel-Cantelli theorem and the fact that $\psi$ is in $Lip^1(\Theta)$, we can extend the
convergence in probability in 1) to an almost sure convergence.

iii) $\Gamma^V_\psi[Id]$ is a mean of the inverse of the local Fisher information. Let us simply show this
on an example: we suppose that $\Theta = \,]-1, 1[\, \setminus \{0\}$, $\rho(d\theta) = q(\theta)d\theta$ and $\psi(\theta) = \theta^2$.

If $a_0 \in \,]0, 1[$, this point has two antecedents for $\psi$: $\theta_1 = \sqrt{a_0}$ with the local Fisher
information $J(\theta_1)/(4a_0)$ and $\theta_2 = -\sqrt{a_0}$ with $J(\theta_2)/(4a_0)$. A calculus of conditional
expectation gives

$$\Gamma^V_\psi[Id](a_0) = \frac{\dfrac{4a_0}{J(\theta_1)}\, q(\theta_1) + \dfrac{4a_0}{J(\theta_2)}\, q(\theta_2)}{q(\theta_1) + q(\theta_2)}$$

which is none other than a barycenter weighted by $\rho$.

iv) When $\rho(d\theta) = q(\theta)d\theta$ with $q$ continuous, we have similar results if we replace the
maximum likelihood estimator by the Bayesian estimator associated to the quadratic loss function
and the a priori law $\rho$.

v) The estimation bound given in 4.2 becomes an asymptotic equality. □
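
The barycenter of remark iii) can be sketched numerically. The helper below is only an illustration for $\psi(\theta) = \theta^2$ on $]-1,1[ \setminus \{0\}$: it assumes that the squared field operator of $\psi$ is $\psi'^2/J$ and that the conditional weights of the two antecedents are proportional to the prior density $q$ (both antecedents have the same $|\psi'|$). The function name and the callables `q` and `fisher` are hypothetical.

```python
import math

def gamma_image_square(a0, q, fisher):
    """Barycenter of the inverse local Fisher informations for psi(theta) = theta**2.

    q      : prior density of V on ]-1, 1[ minus {0} (hypothetical callable)
    fisher : Fisher information J(theta) of the model (hypothetical callable)
    """
    if not 0.0 < a0 < 1.0:
        raise ValueError("a0 must lie in ]0, 1[")
    roots = (math.sqrt(a0), -math.sqrt(a0))
    # psi'(theta)**2 / J(theta) = 4 * a0 / J(theta) at each antecedent of a0
    inverse_local_info = [4.0 * a0 / fisher(t) for t in roots]
    weights = [q(t) for t in roots]
    weighted_sum = sum(w * v for w, v in zip(weights, inverse_local_info))
    return weighted_sum / sum(weights)

# Uniform prior and constant Fisher information J = 1: the value reduces to 4 * a0.
uniform_case = gamma_image_square(0.25, lambda t: 0.5, lambda t: 1.0)

# Non-symmetric prior and Fisher information: a genuine weighted mean.
skewed_case = gamma_image_square(
    0.25,
    lambda t: (1.0 + t) / 2.0,
    lambda t: 2.0 if t > 0 else 1.0,
)
print(uniform_case, skewed_case)
```

In the second call the antecedent $\theta_1 = 0.5$ carries three times the prior weight of $\theta_2 = -0.5$, so the result is pulled toward the inverse local information at $\theta_1$.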

In order to obtain a quadratic convergence for $\sqrt{n}(\psi(\hat{\theta}_n) - a)$ (which makes it possible to
approximate $\Gamma^V_\psi[Id]$ by Monte-Carlo methods) we have to reinforce the hypotheses of proposition
4.3.5.



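Proposition 4.3.6 Let us suppose that the model $(P_\theta)_{\theta \in \Theta}$ can be extended to a regular model
on an open set $\Theta'$ such that $\Theta \subset \Theta'$. Moreover, if

1) $0 < \inf_{\Theta'} J(\theta) \leq \sup_{\Theta'} J(\theta) < \infty$

2) $\forall\delta > 0$,

$$\inf_{\theta \in \Theta'}\ \inf_{u \in U_{1,\theta},\ |u| > \delta} r(P_\theta, P_{\theta+u}) > 0$$

where $U_{1,\theta} = \{u \in \mathbb{R} \mid \theta + u \in \Theta'\}$, then, $\forall a \in \psi(\Theta)$

$$\mathbb{E}\big[n(\psi(\hat{\theta}_n(X_1,\ldots,X_n)) - a)^2 \mid \psi(V) = a\big] \to \Gamma^V_\psi[Id](a).$$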
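Proof: Conditions 1) and 2) lead to a uniform version of theorem 4.3.3:

$$\sup_{\theta \in \Theta} \Big|\mathbb{E}[n(\hat{\theta}_n - \theta)^2 \mid V = \theta] - \frac{1}{J(\theta)}\Big| \to 0. \quad (8)$$

By Fubini's theorem,

$$\mathbb{E}[n(\psi(\hat{\theta}_n) - a)^2 \mid \psi(V) = a]$$

is equal to

$$\mathbb{E}_\rho\Big[\int n(\psi(\hat{\theta}_n(x_1,\ldots,x_n)) - \psi)^2 f(x_1,\cdot) \cdots f(x_n,\cdot)\, d\mu(x_1) \cdots d\mu(x_n) \,\Big|\, \psi = a\Big].$$

Since $\psi$ is Lipschitz, it follows from (8) that

$$A = \int n(\psi(\hat{\theta}_n(x_1,\ldots,x_n)) - \psi(\theta))^2 f(x_1,\theta) \cdots f(x_n,\theta)\, d\mu(x_1) \cdots d\mu(x_n)$$

fulfills $A \leq \frac{k}{J(\theta)}$ with $\frac{1}{J} \in L^1(\rho)$ and $k \in \mathbb{R}^*_+$. We conclude thanks to the dominated
convergence theorem using that

$$\mathbb{E}[n(\hat{\theta}_n - \theta)^2 \mid V = \theta] \to \frac{1}{J(\theta)}$$

implies

$$\mathbb{E}[n(\psi(\hat{\theta}_n) - \psi(\theta))^2 \mid V = \theta] \to \frac{\psi'^2(\theta)}{J(\theta)}. \quad □$$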
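
The Monte-Carlo approximation mentioned before proposition 4.3.6 can be sketched as follows. This is only an illustration under explicit assumptions: a Gaussian location model $N(\theta, 1)$ (so $J = 1$), a uniform prior on $]-1,1[ \setminus \{0\}$, $\psi(\theta) = \theta^2$, and the sample mean used as maximum likelihood estimator (the constraint $\theta \in \Theta$ is ignored). In this setting the conditional scaled quadratic error equals $4a + 3/n$ exactly, which tends to $4a$. The function name and its parameters are hypothetical.

```python
import random

def mc_scaled_quadratic_error(a, n=100, n_runs=4000, seed=0):
    """Monte-Carlo estimate of E[n (psi(theta_hat_n) - a)^2 | psi(V) = a].

    Assumptions of this sketch: observations are N(theta, 1), psi(theta) = theta**2,
    the prior is symmetric so V = +sqrt(a) or -sqrt(a) with probability 1/2 each,
    and theta_hat_n is the sample mean.
    """
    rng = random.Random(seed)
    root = a ** 0.5
    total = 0.0
    for _ in range(n_runs):
        # Draw the parameter conditionally on psi(V) = a.
        theta = root if rng.random() < 0.5 else -root
        # Draw an n-sample and compute the estimator.
        sample_mean = sum(rng.gauss(theta, 1.0) for _ in range(n)) / n
        total += n * (sample_mean ** 2 - a) ** 2
    return total / n_runs

a = 0.25
estimate = mc_scaled_quadratic_error(a)
print(estimate, 4.0 * a)  # the two values should be close for large n and n_runs
```

The estimate fluctuates with the seed and the number of runs; only its closeness to $4a$ is meaningful here.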



4.3.3 Comments and perspectives. 



From the hypotheses made on the model $(P_\theta)_{\theta \in \Theta}$, we are able to give a bound concerning the
direct estimation of $\psi(\theta)$, using the experiment generated by the observation of $(X_1, \ldots, X_n)$
given $\psi(V) = a$. A question naturally arises: what happens when the model $(Q_a)_{a \in \psi(\Theta)}$ is
sufficiently regular to define its Fisher information matrix $J_{\psi(V)}$? One has another estimation
bound that appears in some limit theorems associated to the estimation of $a = \psi(\theta)$ by means
of an $n$-sample of $Q_a$.

When $\psi$ is injective it is easy to show that those bounds coincide, but this is not generally
the case, as we can see in the following example.
Suppose

- $\Theta = \,]-1, 1[\, \setminus \{0\}$, $\rho$ is the normalized uniform law on $\Theta$

- $dP_\theta(x) = f(x,\theta)\,d\mu(x) = \frac{1}{\sqrt{2\pi}}\, e^{-\frac{(x-\theta)^2}{2}}\, dx$

- $\psi(\theta) = \theta^2$.

The model $(P_\theta)_{\theta \in \Theta}$ is regular and fulfills the assumptions of proposition 2.2.2 a). From the
definition of the conditional expectation, we obtain for $a \in \,]0, 1[$

$$dQ_a(x) = \frac{f(x,\sqrt{a}) + f(x,-\sqrt{a})}{2}\, dx = h(x,a)\,dx.$$

As the function $a \mapsto h(x,a)$ is in $C^1(]0,1[, \mathbb{R})$ and, according to the dominated convergence
theorem, $a \mapsto \int \frac{(\partial_a h(x,a))^2}{h(x,a)}\, dx$ is continuous, using the method of [13] p. 95, one shows that the
model $(Q_a)$ is regular. Moreover we have

$$\Gamma^V_\psi[Id](a) = 4a.$$

In order to compare $\Gamma^V_\psi[Id]$ and $J_{\psi(V)}$ we need the following lemma.



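Lemma 4.3.7 Suppose that $p(x,\theta)d\mu(x)$ and $r(x,\theta)d\mu(x)$ are two regular models on $\Theta$ such
that the function $\theta \mapsto (p(x,\theta), r(x,\theta))$ is differentiable. If we put $s(x,\theta) = p(x,\theta) + r(x,\theta)$ then

$$\int \frac{s'^2}{s}\, d\mu(x) \leq \int \frac{p'^2}{p}\, d\mu(x) + \int \frac{r'^2}{r}\, d\mu(x). \quad (9)$$

Proof: We set $\tilde{s} = \sqrt{s}$, $\tilde{p} = \sqrt{p}$, $\tilde{r} = \sqrt{r}$; inequality (9) becomes

$$\int \tilde{s}'^2\, d\mu(x) \leq \int \tilde{p}'^2\, d\mu(x) + \int \tilde{r}'^2\, d\mu(x). \quad (10)$$

Since $\tilde{s}^2 = \tilde{p}^2 + \tilde{r}^2$, it is easy to show that

$$(10) \Longleftarrow \tilde{p}^2 \tilde{r}'^2 + \tilde{r}^2 \tilde{p}'^2 \geq 2\, \tilde{p}\, \tilde{p}'\, \tilde{r}\, \tilde{r}' \quad \mu\text{-a.e.}$$

and (9) follows, with equality if and only if $\tilde{p}\,\tilde{r}' = \tilde{p}'\,\tilde{r}$. □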



Thus we have

$$\frac{1}{J_{\psi(V)}} \geq \Gamma^V_\psi[Id]. \quad (11)$$



Hence, in this situation, we can see that error calculus gives a more precise bound. At 
present, we are not able to exhibit an example where (11) is contradicted. 
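
Inequality (11) can be checked numerically on the example above. The sketch below assumes that the image model has density $h(x,a) = \frac{1}{2}(f(x,\sqrt{a}) + f(x,-\sqrt{a}))$ with $f(x,\theta)$ the $N(\theta,1)$ density, and it computes the Fisher information $\int (\partial_a h)^2 / h\, dx$ by the trapezoidal rule, using the closed-form derivative of $h$ with respect to $a$. Function names and the integration range are hypothetical choices.

```python
import math

def normal_pdf(x, theta):
    """Density of N(theta, 1) at x."""
    return math.exp(-0.5 * (x - theta) ** 2) / math.sqrt(2.0 * math.pi)

def fisher_info_image_model(a, lo=-12.0, hi=12.0, steps=24000):
    """Fisher information in a of h(x, a) = (f(x, sqrt a) + f(x, -sqrt a)) / 2."""
    t = math.sqrt(a)
    step = (hi - lo) / steps
    total = 0.0
    for i in range(steps + 1):
        x = lo + i * step
        f_plus, f_minus = normal_pdf(x, t), normal_pdf(x, -t)
        h = 0.5 * (f_plus + f_minus)
        # d/da of h, using d(sqrt a)/da = 1 / (2 sqrt a)
        dh = ((x - t) * f_plus - (x + t) * f_minus) / (4.0 * t)
        weight = 0.5 if i in (0, steps) else 1.0  # trapezoidal end weights
        total += weight * dh * dh / h
    return total * step

for a in (0.1, 0.25, 0.5, 0.9):
    info = fisher_info_image_model(a)
    # Compare the inverse Fisher information of the image model with 4a.
    print(a, 1.0 / info, 4.0 * a)
```

For each value of $a$ the second printed column should dominate the third, in agreement with (11).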

5 Product structures. 

First of all, we recall the definition of the product of two error structures (see [3] p. 200). 



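Definition 5.0.8 If $S_i = (W_i, \mathcal{W}_i, m_i, \mathbb{D}_i, \Gamma_i)$ ($i = 1, 2$) are two error structures, the product,
denoted by $S_1 \otimes S_2$, is defined as the structure $(W_1 \times W_2, \mathcal{W}_1 \otimes \mathcal{W}_2, m_1 \otimes m_2, \mathbb{D}, \Gamma)$ with

$$\mathbb{D} = \Big\{ f \in L^2(m_1 \otimes m_2) \;\Big|\; \text{for } m_2\text{-almost every } y,\ f(\cdot,y) \in \mathbb{D}_1;\ \text{for } m_1\text{-almost every } x,\ f(x,\cdot) \in \mathbb{D}_2;\ \int \Gamma[f](x,y)\, dm_1(x)\, dm_2(y) < \infty \Big\}$$

and

$$\Gamma[f](x,y) = \Gamma_1[f(\cdot,y)](x) + \Gamma_2[f(x,\cdot)](y).$$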



Here we are interested in the evaluation of a parameter $\theta = (\theta_1, \theta_2)$ where $\theta_1$ and $\theta_2$ are
supposed to be independent, i.e. $V_1$ and $V_2$ are independent random variables. Let us denote by
$V = (V_1, V_2) : (\Omega, \mathcal{A}, \mathbb{P}) \to \Theta_1 \times \Theta_2$ the realization of the parameter $\theta$. The law of the random
variable $V$, denoted by $\rho$, fulfills:

$$d\rho(\theta) = d\rho_1(\theta_1)\, d\rho_2(\theta_2).$$

To estimate $\theta_1$ [resp. $\theta_2$] we choose the following regular parametric model:

$$dP_{\theta_1} = f(x,\theta_1)\, d\mu(x) \quad [\text{resp. } dQ_{\theta_2} = g(y,\theta_2)\, d\nu(y)]$$

with a regular Fisher information matrix $J_1(\theta_1)$ [resp. $J_2(\theta_2)$] such that hypothesis E is fulfilled.

Let us consider a random variable $X$ [resp. $Z$] with a conditional law given $V_1 = \theta_1$ [resp.
$V_2 = \theta_2$] having the density:

$$f(x,\theta_1)\, d\mu(x) \quad [\text{resp. } g(y,\theta_2)\, d\nu(y)].$$

We suppose that $(X, V_1)$ and $(Z, V_2)$ are independent.

Remark L: We are in the situation where the pairs (parameter, observation) are independent.
In terms of errors, this independence has to be linked with (5), which is the intuitive
meaning of the preceding definition of product structures. □

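To estimate $\theta$, it is natural to use the conditional law of $(X, Z)$ given $V = (\theta_1, \theta_2)$, denoted
by $R_{\theta_1,\theta_2}$. From these hypotheses, it follows that

$$dR_{\theta_1,\theta_2} = f(x,\theta_1)\, g(y,\theta_2)\, d\mu(x)\, d\nu(y).$$

Thus, we obtain for this model the following Fisher information matrix

$$\begin{pmatrix} J_1(\theta_1) & 0 \\ 0 & J_2(\theta_2) \end{pmatrix}$$

and for $F \in Lip^1(\Theta_1 \times \Theta_2)$,

$$\Gamma^V[F](\theta_1,\theta_2) = \frac{[F'_1(\theta_1,\theta_2)]^2}{J_1(\theta_1)} + \frac{[F'_2(\theta_1,\theta_2)]^2}{J_2(\theta_2)}.$$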
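
A minimal sketch of this product formula, assuming scalar parameters $\theta_1$, $\theta_2$ and a user-supplied gradient of $F$; the function name and its arguments are hypothetical and only serve to illustrate how the two variance contributions add up.

```python
def gamma_product(grad_f, theta1, theta2, fisher1, fisher2):
    """Squared field operator of F at (theta1, theta2) for two independent models.

    grad_f  : callable returning the pair of partial derivatives of F
    fisher1 : Fisher information J_1(theta1) of the first model (callable)
    fisher2 : Fisher information J_2(theta2) of the second model (callable)
    """
    d1, d2 = grad_f(theta1, theta2)
    # Each coordinate contributes (partial derivative)^2 / (its Fisher information).
    return d1 ** 2 / fisher1(theta1) + d2 ** 2 / fisher2(theta2)

# Illustrative constant Fisher informations (assumed values).
j1 = lambda t: 4.0
j2 = lambda t: 2.0

# F(theta1, theta2) = theta1 * theta2 has gradient (theta2, theta1).
product_error = gamma_product(lambda a, b: (b, a), 1.0, 3.0, j1, j2)

# F(theta1, theta2) = theta1 + theta2 has gradient (1, 1).
sum_error = gamma_product(lambda a, b: (1.0, 1.0), 1.0, 3.0, j1, j2)
print(product_error, sum_error)
```

For the sum, the result is $1/J_1 + 1/J_2$: the inverse informations of the two independent experiments simply add.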



Then we have the following proposition: 



Proposition 5.0.9 1) $S^V = S^{V_1} \otimes S^{V_2}$

2) If $\psi_1$ and $\psi_2$ are regular changes of variables then



Proof: 1) Let us notice that $Lip^1(\Theta_1 \times \Theta_2)$ is included in the domain of the product structure
$S^{V_1} \otimes S^{V_2}$.

Moreover, from the expression of the information matrix, for $F \in Lip^1(\Theta_1 \times \Theta_2)$ it follows
that

$$\mathcal{E}^V[F] = \int \mathcal{E}^{V_1}[F(\cdot,y)]\, d\rho_2(y) + \int \mathcal{E}^{V_2}[F(x,\cdot)]\, d\rho_1(x).$$

Thus $\|\cdot\|_{\mathcal{E}^V}$ coincides on $Lip^1(\Theta_1 \times \Theta_2)$ with the norm associated to the product structure.
Hence, we can deduce that hypothesis E is fulfilled for the model $(R_{\theta_1,\theta_2})_{\Theta_1 \times \Theta_2}$: $S^V$ is
well-defined.

Furthermore, since $Lip^1(\Theta_1 \times \Theta_2)$ is dense in $(\mathbb{D}^V, \|\cdot\|_{\mathcal{E}^V})$, $\mathbb{D}^V$ is included in the domain of
the product structure and the two squared field operators coincide on $\mathbb{D}^V$.

For the other inclusion, we use the fact that the functions of the form $F = \sum_{i=1}^{p} f_i g_i$ with
$f_i \in \mathbb{D}^{V_1}$ and $g_i \in \mathbb{D}^{V_2}$ are dense in the domain of the product structure for the associated norm
(see [3] p. 201) and belong to $\mathbb{D}^V$, as easily seen using the closedness of the forms $\mathcal{E}^{V_1}$ and $\mathcal{E}^{V_2}$.
2) The equality comes from 1) and section 3. □

Remarks M: i) The preceding results extend obviously to $n$-tuples.
ii) We can notice that this property expresses the additive property of the Fisher
information for independent experiments. □

Since it is easy to build infinite products of error structures (see [3], [B], [2]), we are able to
obtain an empirical error calculus associated to the estimation of parameters of the type
$\theta = (\theta_i)_{i \in \mathbb{N}}$, working component per component.



6 The choice of an a priori law p. 

In the preceding sections, the choice of an a priori law on the space of parameters is left 
to the practitioner as in the bayesian analysis. The determination of our error structure S v 
can appear, to some degree, incomplete. We are going to show that, once a regular parametric 
model is chosen, a natural probability measure becomes apparent: the Jeffreys prior (see [12] 
p. 490). This probability is well known in bayesian analysis. Moreover, it possesses a remarkable 
stability concerning error calculus: it is invariant under reparameterization and compatible with 
the notion of product. 

Let $(P_\theta)_{\theta \in \Theta}$ be a regular model such that

$$K = \int_\Theta \sqrt{\det(J(\theta))}\, d\theta < \infty.$$

We can define on $\Theta$ the following probability measure

$$\rho^V(d\theta) = \frac{\mathbf{1}_\Theta(\theta)\sqrt{\det(J(\theta))}\, d\theta}{K}$$

called the Jeffreys prior induced by the model $(P_\theta)_{\theta \in \Theta}$. It is often used in Bayesian analysis
for its invariance under reparameterization. Moreover it is the prior measure which has the
smallest influence on the posterior measure in the sense of the asymptotic Shannon information
(see [13]). In terms of error calculus its properties are summarized in the following proposition:

Proposition 6.0.3.1 a) If $\psi : \Theta \to \mathbb{R}^d$ is a regular change of variables
b) In the framework of section 5

$$\rho^{(V_1,V_2)} = \rho^{V_1} \otimes \rho^{V_2}$$

Proof: Obvious using the classical properties of the Fisher information.



□ 
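
The invariance of the Jeffreys prior under reparameterization can be illustrated on a model that is not taken from the text: a Bernoulli observation with success probability $\theta \in \,]0,1[$, for which $J(\theta) = 1/(\theta(1-\theta))$. Under the assumed change of variables $\theta = \sin^2 t$, $t \in \,]0, \pi/2[$, the Fisher information becomes $J(\theta)\,(d\theta/dt)^2 = 4$, so the Jeffreys prior is uniform in $t$ and the normalizing constant $K$ is the same in both parameterizations (here $K = \pi$). The helper name and the step count are hypothetical.

```python
import math

def jeffreys_normalizer(sqrt_info, lo, hi, steps=200000):
    """Midpoint-rule approximation of K = integral over ]lo, hi[ of sqrt(J)."""
    h = (hi - lo) / steps
    return h * sum(sqrt_info(lo + (i + 0.5) * h) for i in range(steps))

# Bernoulli model in the theta parameterization: sqrt(J) = 1 / sqrt(theta (1 - theta)).
sqrt_info_theta = lambda th: 1.0 / math.sqrt(th * (1.0 - th))

# Same model with theta = sin(t)**2: the Fisher information is constant, equal to 4.
sqrt_info_t = lambda t: 2.0

k_theta = jeffreys_normalizer(sqrt_info_theta, 0.0, 1.0)
k_t = jeffreys_normalizer(sqrt_info_t, 0.0, math.pi / 2.0)
print(k_theta, k_t, math.pi)
```

The first value converges slowly because the integrand is singular at both endpoints; the midpoint rule is used precisely because it never evaluates the integrand there.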



Finally, with suitable hypotheses, the Jeffreys prior may be seen as the invariant measure 
of the generator associated to the induced infinitesimal perturbation in the convergence of the 
maximum likelihood estimator. 

7 Conclusion. 

Through statistical experiments, we have seen that the fundamental identification gives an error
structure intrinsically linked to the observed physical phenomenon. The remarkable robustness
of this identification, regarding injective changes of variables and products, yields a particularly
efficient tool for finite dimensional estimation.

The existence of such an error structure built from the parametric model makes it possible to
propagate the accuracy through calculations performed with the parameter thanks to a coherent
specific differential calculus (property 1 of $\Gamma$). Moreover error calculus provides a natural
framework for the study of non-injective mappings. A possible extension would be to generalize
such an experimental protocol when $J$ is singular and also to explore more precisely the
connections between Dirichlet forms and asymptotic statistics. Finally, we wonder whether the
semi-parametric and non-parametric estimation theories (see [TH]) could lay the foundation of
an infinite dimensional identification in order to get $\Gamma$ on the Wiener space, using a direct
functional reasoning instead of a component per component argument as above.